Search results for "Absent words"

showing 4 items of 4 documents

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…

0301 basic medicineFOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheorySequence alignmentInformation System0102 computer and information sciencesCircular wordAbsent words01 natural sciencesUpper and lower boundsSequence comparisonTheoretical Computer ScienceCombinatorics03 medical and health sciencesComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Absent wordCircular wordsMathematicsSequenceSettore INF/01 - InformaticaProcess (computing)q-gramComputer Science Applications1707 Computer Vision and Pattern Recognitionq-gramsComposition (combinatorics)Computer Science Applications030104 developmental biologyComputational Theory and MathematicsForbidden words010201 computation theory & mathematicsFocus (optics)Forbidden wordWord (computer architecture)Information SystemsInteger (computer science)

researchProduct

Constructing Antidictionaries in Output-Sensitive Space

2021

A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because a…

FOS: Computer and information sciencesSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniOutput sensitive algorithmsString algorithmsPhysicsAntidictionarieSettore INF/01 - InformaticaOutput sensitive algorithm0102 computer and information sciencesAbsent wordsSpace (mathematics)01 natural sciencesAntidictionariesCombinatorics010201 computation theory & mathematicsTheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITYData compressionComputer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Computer Science::Symbolic Computation[INFO]Computer Science [cs]Absent wordAlphabetWord (group theory)2019 Data Compression Conference (DCC)

researchProduct

Minimal Absent Words in Rooted and Unrooted Trees

2019

We extend the theory of minimal absent words to (rooted and unrooted) trees, having edges labeled by letters from an alphabet $\varSigma $ of cardinality $\sigma $. We show that the set $\text {MAW}(T)$ of minimal absent words of a rooted (resp. unrooted) tree T with n nodes has cardinality $O(n\sigma )$ (resp. $O(n^{2}\sigma )$), and we show that these bounds are realized. Then, we exhibit algorithms to compute all minimal absent words in a rooted (resp. unrooted) tree in output-sensitive time $O(n+|\text {MAW}(T)|)$ (resp. $O(n^{2}+|\text {MAW}(T)|)$ assuming an integer alphabet of size polynomial in n.

Polynomial (hyperelastic model)050101 languages & linguistics05 social sciencesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)02 engineering and technologyCombinatoricsTree (descriptive set theory)CardinalityInteger0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processing0501 psychology and cognitive sciencesAlphabetMinimal Absent Words Rooted trees Unrooted Trees AlgorithmsNonlinear Sciences::Pattern Formation and SolitonsComputer Science::Formal Languages and Automata TheoryMathematics

researchProduct

Some Investigations on Similarity Measures Based on Absent Words

2019

In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common s…

sequence comparisonAlgebra and Number TheorySettore INF/01 - Informaticabusiness.industryComputer sciencePattern recognitionsimilarity measuresMinimal absent wordsTheoretical Computer ScienceComputational Theory and MathematicsSimilarity (network science)Artificial intelligencebusinessInformation SystemsFundamenta Informaticae

researchProduct